Requirements (CRUD)

Functional

1. User can Read the feed.
2. User can Create the feed(images, text, videos)

Non Functional

1. Update the feed
2. Delete the item from feed
3. News feed keep on extending when it gets over

Back of Envelope

      World Population      | InternetUsers(60%)    | FB Users(80-85% of Internet users) | Active FB Users(20%)
      ----------------------|-----------------------|------------------------------------|--------------------
      7 Billion //Year 2020 | 7 x 0.6 = 4.2 Billion |    4.2 x 0.8 = 3.5 Billion         | 3.5 x .2 = 700 Million
    

Traffic Estimates (Incoming Requests/sec)

      700 Million pulling their new feed 5 times a day. Total New feed requests = 700 x 5 = 3500 Million requests/day.
      3500/24x60x60 = Around 39,000 Requests/sec.
    

Storage Estimates

      Cache/CDN
      Suppose We want to keep 500 posts/user on CDN for quick fetch.
      Total posts to be stored. 500 x 700 Million = 350 Billion.
      Let's assume on average each posts is 1 KB in size. Total bytes = 350 TB
      Assuming 1 server can cache is 100GB. Total servers needed = 350 TB / 100 GB = 3500 Machines needed in total
    

API Design

1. Create Record


POST https://url/v1/post_feed
header {
    Authorization: {Bearer "API_KEY_TOKEN"},
}
body {
    heading, description, picture_url, video_url
}
      

2. Reading the Feed


GET HTTP/2 https://url/v1/read_feed
header {
    Authorization: {Bearer "API_KEY_TOKEN"},
}
body {
    to be filled
}
      

DB Schema

SQL DB

      User information, password, comments can be stored in seperate tables.
      Likes to Comments can be stored in seperate table and comments table can store like_id.
      

User Table                                                    Password Table
| username | user_id(pk) | created_at | enabled |             | user_id(pk) | password_hash | created_at | expiry_at |

Comments Table: Comment can have text,video,image,@,#,likes,shares
| comment_id (pk) | text_id(fk) | user_id (fk) | media-id(fk) | created_at | like_id(fk) | share_id(fk) | ...
      id1                t1           u1            m1              -             l1          s1

Text Table                                likes Table                                               Media Table (Photo, text, video)
| key=text_id (pk) | value=text url |     | key=like_id (pk) | value=file url storing all likes |   | key=text_id (pk) | value=text url |
        t1              hello                   l1                (user2, user3, user4 ..)

>> Friends & Friends of Friends stored in Graph DB
      
Problems in Above design (likes table)
1. Race Condition, when number of likes on comment grows in Millions
We will keep appending userid's on in same file (ie same row). if new like comes in DB has to read entire line then add userid at end and update table.
This can cause race condition (if 100's of users try to like same comment at same time).
Solution-1 (Key-Value DB: Redis)
Seperate row for each like. Key-Value DBs can handle huge datasets with fast lookups
Precomputing the number of likes.

likes Table
SADD comment_id1 user1
SADD comment_id1 user2
SADD comment_id1 user3
SADD comment_id2 user2
SREM comment_id1 user2   # Unlike
SCARD comment:id1:likes        # Get like count
      
Solution-2 (Graph DB(neoj4))
In a graph database, you can model User and Comment as nodes and the LIKES relationship as an edge.
Likes = Number of edges

        user-1    ----authored-->     comment-1
                                        /\
        user2 ----------liked------------|
      

HLD

JWT based authentication to facebook.com

POST Message

The system limits the number of posts a user can make within a certain period, vital to prevent spam and abusive content
post comment

GET Newsfeed

Push Model Pull Model
What Whenever someone posts a item(text, video, audio), Generate newsfeed for all friends/followers and send to friend/follower. Client side browser caches it and delivers when client comes online Only Generate feed whenever request to retrieve comes from client
Advantage, Disadvatange Adv: Newfeed is calculated in advance, which leads to very less delay on client to read news feed
Disadv: For inactive users/friends also newfeed is calculated. More resources are used upfront
Adv: Newfeed is calculated for active users only, which leads to low resource usage
Disadv: Client can get significant latency in receiving newsfeed as its calculated when client asks
Usecase we use a push model for the majority of users Celebrity Case: Whenever a celebrity posts a message. Feed for its followers is calculated and delivered.
What we will do?
We will use both of approaches based on the situation. See usecase in above table
Newfeed Architecture
Get Newsfeed